vulkan: optimize Q2_K and Q3_K mul_mat_vec #10459

jeffbolznv · 2024-11-23T02:14:25Z

Similar to the optimizations I recently made for Q4_K, Q5_K, and Q6_K.

Perf results on RTX 4070:

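Before:

| model                          |       size |     params | backend    |  ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ---: | ------------: | -------------------: |
| llama 8B Q2_K - Medium         |   2.95 GiB |     8.03 B | Vulkan     | 1000 |         tg128 |         59.12 ± 0.66 |
| deepseek2 16B Q2_K - Medium    |   5.99 GiB |    15.71 B | Vulkan     | 1000 |         tg128 |        102.23 ± 0.43 |

After:

| model                          |       size |     params | backend    |  ngl |          test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | ---: | ------------: | -------------------: |
| llama 8B Q2_K - Medium         |   2.95 GiB |     8.03 B | Vulkan     | 1000 |         tg128 |         72.17 ± 0.51 |
| deepseek2 16B Q2_K - Medium    |   5.99 GiB |    15.71 B | Vulkan     | 1000 |         tg128 |        109.92 ± 0.62 |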
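
For a quick read of the tg128 figures above, the relative change can be worked out directly from the mean throughput values. The sketch below is only illustrative arithmetic: the `results` dictionary and the `speedup_pct` helper are names made up for this example, and the ± run-to-run spread reported in the tables is ignored.

```python
# Mean tg128 throughput in tokens/s, copied from the before/after tables.
# The dictionary layout and helper name are hypothetical, for illustration only.
results = {
    "llama 8B Q2_K - Medium": (59.12, 72.17),
    "deepseek2 16B Q2_K - Medium": (102.23, 109.92),
}


def speedup_pct(before: float, after: float) -> float:
    """Relative throughput gain in percent (positive means faster)."""
    return (after / before - 1.0) * 100.0


for model, (before, after) in results.items():
    print(f"{model}: {speedup_pct(before, after):+.1f}%")
```

On these numbers the gain is roughly 22% for the llama 8B model and about 7.5% for deepseek2 16B.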

I have read the contributing guidelines
Self-reported review complexity:
- Low
- Medium
- High

0cc4m

Looks good on Nvidia, AMD and Intel. Thank you.

vulkan: optimize Q2_K and Q3_K mul_mat_vec

29b273f

jeffbolznv requested a review from 0cc4m November 23, 2024 02:14

0cc4m approved these changes Nov 27, 2024

0cc4m merged commit 4a57d36 into ggml-org:master Nov 27, 2024
7 checks passed

stduhpf mentioned this pull request Dec 7, 2024

Eval bug: ~~Q2_K and Q3_K~~ Q8_0 not working on Vulkan anymore on RX 5700XT #10710

Closed

arthw pushed a commit to arthw/llama.cpp that referenced this pull request Dec 20, 2024

vulkan: optimize Q2_K and Q3_K mul_mat_vec (ggml-org#10459)

9bb00fe
